
Applications in Computer Vision

    

    

FIGURE 6.8

(a) and (b) illustrate the distribution of the unbinarized weights $w_i$ of the 6th 1-bit layer in the 1-bit PointNet backbone when trained with XNOR-Net and with our POEM, respectively. From left to right, we report the weight distribution at initialization and at the 40th, 80th, 120th, 160th, and 200th epochs. POEM produces a clearly bimodal distribution, which is much more robust.

Weight distribution: The POEM-based model is built on an Expectation-Maximization process and is implemented on the PyTorch [186] platform. We compare the weight distributions obtained when training with XNOR-Net and with POEM, which offers indirect support for our motivation. For a 1-bit PointNet model, we analyze the 6th 1-bit layer, which has size (64, 64) and 4096 elements, and plot its weight distribution at the {0, 40, 80, 120, 160, 200}th epochs. Figure 6.8 shows that the initialization (0th epoch) is the same for XNOR-Net and POEM. However, POEM uses the Expectation-Maximization algorithm to supervise the backpropagation process, which leads to an effective and robust bimodal distribution. This analysis is also consistent with the performance comparison in Table 6.5.
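To make the link between EM and a bimodal weight distribution more concrete, the sketch below fits a two-component 1-D Gaussian mixture to the weights of a single layer using a plain EM loop. It is a generic illustration built on our own assumptions, not POEM's actual update rule. The function name `fit_two_gaussians`, the symmetric initialization at ±std(w), and the synthetic 64 × 64 weights (modes at ±0.1 with noise of standard deviation 0.01) are all hypothetical. On bimodal weights like these, the two fitted means settle near the two modes.

```python
import numpy as np


def fit_two_gaussians(w, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture to weights w with EM.

    Illustrative sketch only: a generic EM fit, not POEM's exact formulation.
    """
    w = np.asarray(w, dtype=np.float64).ravel()
    # Hypothetical initialization: two modes placed symmetrically around zero.
    mu = np.array([-np.std(w), np.std(w)])
    var = np.full(2, np.var(w))
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: log-density of each weight under each component.
        log_p = (
            np.log(pi)
            - 0.5 * np.log(2.0 * np.pi * var)
            - 0.5 * (w[:, None] - mu) ** 2 / var
        )
        # Normalize in log space (subtract row max) to avoid underflow.
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances.
        nk = resp.sum(axis=0)
        pi = nk / w.size
        mu = (resp * w[:, None]).sum(axis=0) / nk
        var = (resp * (w[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-12
    return pi, mu, var


# Synthetic bimodal weights for a hypothetical (64, 64) layer.
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=(64, 64))
w = signs * 0.1 + rng.normal(0.0, 0.01, size=(64, 64))
pi, mu, var = fit_two_gaussians(w)
print("mixing weights:", np.round(pi, 2), "means:", np.round(mu, 3))
```

The responsibilities are normalized in log space before exponentiation so that the E-step stays numerically stable once the component variances become small.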

6.4 LWS-Det: Layer-Wise Search for 1-bit Detectors

The performance of 1-bit detectors typically degrades so much that they are rarely deployed on real-world embedded devices. For example, BiDet [240] achieves only 13.2% mAP@[.5, .95] on the COCO minival dataset [145], which is 10.0% below its real-valued counterpart (on the SSD300 framework). We believe the reason is that the layer-wise binarization error significantly affects the learning of 1-bit detectors.
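One simple way to quantify a layer's binarization error is the relative reconstruction error $\|W - \alpha\,\mathrm{sign}(W)\|^2 / \|W\|^2$, where $\alpha$ is the mean absolute weight, as in XNOR-Net. The sketch below computes this error for two hypothetical 64 × 64 layers, one with Gaussian weights and one with bimodal weights. It uses a single scaling factor per layer, which simplifies XNOR-Net's per-filter factor, and it is not LWS-Det's search objective. The names `binarize_xnor` and `relative_binarization_error` are illustrative.

```python
import numpy as np


def binarize_xnor(w):
    """XNOR-Net-style approximation w ~= alpha * b with b in {-1, +1}.

    Simplification (assumption): one alpha for the whole layer.
    """
    b = np.where(w >= 0, 1.0, -1.0)  # map exact zeros to +1 so b stays binary
    alpha = np.abs(w).mean()
    return alpha, b


def relative_binarization_error(w):
    """Return ||w - alpha*b||^2 / ||w||^2 for one layer's real-valued weights."""
    alpha, b = binarize_xnor(w)
    return float(np.sum((w - alpha * b) ** 2) / np.sum(w ** 2))


rng = np.random.default_rng(0)
# Hypothetical layers: unimodal (Gaussian) versus bimodal weight tensors.
layers = {
    "gaussian": rng.normal(0.0, 0.1, size=(64, 64)),
    "bimodal": rng.choice([-1.0, 1.0], size=(64, 64)) * 0.1
    + rng.normal(0.0, 0.01, size=(64, 64)),
}
errors = {name: relative_binarization_error(w) for name, w in layers.items()}
for name, err in errors.items():
    print(f"{name}: relative binarization error = {err:.3f}")
```

For Gaussian weights, the expected ratio is about $1 - 2/\pi \approx 0.36$, because $\alpha^2$ approaches $(2/\pi)\sigma^2$. For the bimodal layer, most weights already lie close to $\pm\alpha$, so the ratio is roughly 0.01.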

TABLE 6.3
The effects of different components of POEM on overall accuracy (OA).

| 1-bit PointNet                                        | OA (%) |
|-------------------------------------------------------|--------|
| XNOR-Net                                              | 81.9   |
| Proposed baseline network                             | 83.1   |
| Proposed baseline network + PReLU                     | 85.0   |
| Proposed baseline network + EM                        | 86.2   |
| Proposed baseline network + LSF                       | 86.5   |
| Proposed baseline network + PReLU + EM + LSF (POEM)   | 90.2   |
| Real-valued counterpart                               | 89.2   |

Note: PReLU, EM, and LSF denote components introduced into our proposed baseline network, and LSF is short for learnable scale factor. "Proposed baseline network + PReLU + EM + LSF" is the POEM we propose.
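As a quick check on the ablation, the sketch below recomputes each row's OA gain over a chosen reference row, using the values from Table 6.3. The dictionary `TABLE_6_3`, its shortened row labels, and the helper `gains_over` are our own illustrative names.

```python
# OA (%) values copied from Table 6.3; row labels are shortened.
TABLE_6_3 = {
    "XNOR-Net": 81.9,
    "Baseline": 83.1,
    "Baseline + PReLU": 85.0,
    "Baseline + EM": 86.2,
    "Baseline + LSF": 86.5,
    "POEM (Baseline + PReLU + EM + LSF)": 90.2,
    "Real-valued counterpart": 89.2,
}


def gains_over(reference, table=TABLE_6_3):
    """Return each row's OA difference (percentage points) relative to `reference`."""
    base = table[reference]
    return {
        name: round(oa - base, 1) for name, oa in table.items() if name != reference
    }


for name, delta in gains_over("Baseline").items():
    print(f"{name:36s} {delta:+.1f}")
```

With the baseline as the reference, PReLU, EM, and LSF add 1.9, 3.1, and 3.4 points individually, and the full POEM adds 7.1 points. The three single-component gains sum to 8.4 points, so in this table the contributions are not simply additive. Passing `"XNOR-Net"` as the reference gives POEM's 8.3-point margin over XNOR-Net.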